Avoiding Overfitting with BP-SOM
Abstract
Overfitting is a well-known problem in the fields of symbolic and connectionist machine learning. It describes the deterioration of the generalisation performance of a trained model. In this paper, we investigate the ability of a novel artificial neural network, bp-som, to avoid overfitting. bp-som is a hybrid neural network which combines a multi-layered feed-forward network (mfn) with Kohonen's self-organising maps (soms). During training, supervised back-propagation learning and unsupervised som learning cooperate in finding adequate hidden-layer representations. We show that bp-som outperforms both standard back-propagation and back-propagation with weight decay in dealing with overfitting. In addition, we show that bp-som succeeds in preserving generalisation performance under hidden-unit pruning, where both other methods fail.

1 On avoiding overfitting

In machine-learning research, the performance of a trained model is often expressed as its generalisation performance, i.e., its capability to correctly process new instances not present in the training set. When the generalisation performance of the trained model is much worse than its performance on the training material (i.e., its ability to reproduce the training material), we speak of overfitting. Overfitting is sometimes due to sparseness of the training material: e.g., the training material does not sufficiently cover the characteristics of the classification task. A second cause of overfitting can be a high degree of non-linearity in the training material. In both cases, the learning algorithm may not be able to learn more from the training material than the classification of the training instances themselves (see, e.g., Norris, 1989).

The issue of avoiding overfitting is well known in the fields of symbolic and connectionist machine learning (e.g., Wolpert, 1992; Schaffer, 1993; Jordan and Bishop, 1996). In symbolic machine learning, a commonly used heuristic to avoid overfitting is minimising the size of the induced models (cf. Quinlan's (1993) C4.5 and C4.5rules), in the sense of the minimum-description-length (mdl) principle (Rissanen, 1983): smaller (or less complex) models should restrict the number of parameters to the minimum required for learning the task at hand. In connectionist machine learning (neural networks), avoiding overfitting is closely related to finding an optimal network complexity. In this view, two types of methods for avoiding overfitting (or regularisation) can be distinguished: (i) starting with an undersized network and gradually increasing the network's complexity (Fahlman and Lebière, 1990), and (ii) starting with an oversized network and gradually decreasing its complexity (e.g., Mozer and Smolensky, 1989; Le Cun, Denker, and Solla, 1990; Weigend, Rumelhart, and Huberman, 1991; Hassibi, Stork, and Wolff, 1992; Prechelt, 1994; Weigend, 1994).

In this paper we analyse the overfitting-avoidance behaviour of a novel artificial neural-network architecture (bp-som; Weijters, 1995), which belongs to the second type of connectionist machine-learning methods. In bp-som, the network complexity is reduced by guiding the hidden-layer representations of a multi-layer feed-forward network (mfn; Rumelhart et al., 1986) towards simplified vector representations. To achieve this aim, bp-som combines the traditional mfn architecture with self-organising maps (soms) (Kohonen, 1984): each hidden layer of the mfn is associated with one som (see Figure 1).

During training of the weights in the mfn, the corresponding som is trained on the hidden-unit activation patterns. The standard mfn error signal is augmented with information from the soms. The effect of the augmented error signals is that, during learning, the hidden-unit activation patterns of clusters of instances associated with the same class tend to become highly similar. Intuitively speaking, the self-organisation of the som guides the mfn into arriving at adequate hidden-unit representations. We demonstrate that bp-som avoids overfitting by reducing the complexity of the hidden-layer representations.

In Section 2, we provide a description of the bp-som architecture and learning algorithm. Section 3 presents experiments with bp-som trained on three benchmark classification tasks, focusing on the ability to avoid overfitting. In addition, we study the robustness of bp-som to hidden-unit pruning. Our conclusions are given in Section 4.
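The augmented error signal can be made concrete with a small sketch. The Python fragment below is a minimal illustration, not the authors' implementation: it assumes a single hidden layer, a flat one-dimensional som, a fixed mixing weight ALPHA, and simple majority-vote class labelling of som cells; the published bp-som additionally uses a two-dimensional map, cell-reliability measures, and neighbourhood updates, all of which are omitted here, and the learning rates and map size are arbitrary.

```python
# Minimal bp-som-style training sketch (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class task: 2 inputs -> 4 hidden units -> 1 output.
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)          # XOR-like labels

N_HID, N_SOM, ALPHA, LR = 4, 6, 0.25, 0.5          # ALPHA mixes BP and SOM errors
W1 = rng.normal(0, 0.5, (2, N_HID)); b1 = np.zeros(N_HID)
W2 = rng.normal(0, 0.5, (N_HID, 1)); b2 = np.zeros(1)
som = rng.uniform(0, 1, (N_SOM, N_HID))            # SOM cells live in hidden space
som_votes = np.zeros((N_SOM, 2))                   # per-cell class counts

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

for epoch in range(200):
    for i in rng.permutation(len(X)):
        x, t = X[i], y[i]
        # Forward pass through the MFN.
        h = sigmoid(x @ W1 + b1)
        o = sigmoid(h @ W2 + b2)

        # Train the SOM on the hidden activation pattern (winner-take-all
        # here; a real SOM would also update the winner's neighbours).
        win = int(np.argmin(np.linalg.norm(som - h, axis=1)))
        som[win] += 0.1 * (h - som[win])
        som_votes[win, int(t)] += 1

        # Standard back-propagation error for the hidden layer.
        d_out = (o - t) * o * (1 - o)
        d_bp = (d_out @ W2.T) * h * (1 - h)

        # SOM component: pull h towards the winning cell's vector, but only
        # if that cell's majority class matches the target class.
        same_class = np.argmax(som_votes[win]) == int(t)
        d_som = (h - som[win]) * h * (1 - h) if same_class else 0.0

        # bp-som-style augmented hidden-layer error signal.
        d_hid = (1 - ALPHA) * d_bp + ALPHA * d_som

        # Gradient-descent weight updates.
        W2 -= LR * np.outer(h, d_out); b2 -= LR * d_out
        W1 -= LR * np.outer(x, d_hid); b1 -= LR * d_hid

acc = np.mean((sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) > 0.5).ravel() == y)
print(f"training accuracy: {acc:.2f}")
```

The key point the sketch conveys is only that the hidden-layer delta mixes a supervised back-propagation term with an unsupervised pull towards class-consistent som prototypes; in the full method this som influence is additionally made reliability-dependent.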
Similar papers
Self-Organizing Map in Data-Analysis - Notes on Overfitting and Overinterpretation
The Self-Organizing Map, SOM, is a widely used tool in exploratory data analysis. Visual inspection of the SOM can be used to list potential dependencies between variables, which are then validated with more principled statistical methods. In this paper we discuss the use of the SOM in searching for dependencies in the data. We point out that simple use of the SOM may lead to an excessive number of...
Overtraining and model selection with the self-organizing map
We discuss the importance of finding the correct model complexity, or regularization level, in the self-organizing map (SOM) algorithm. The complexity of the SOM is determined mainly by the width of the final neighborhood, which is usually chosen ad hoc or set to zero for optimal quantization error. However, if the SOM is used for visualizing the joint probability distribution of the data, then...
Generative Probability Density Model in the Self-Organizing Map
The Self-Organizing Map, SOM, is a widely used tool in exploratory data analysis. A theoretical and practical challenge in the SOM has been the difficulty of treating the method as a statistical model-fitting procedure. In this chapter we give a short review of statistical approaches for the SOM. Then we present the probability density model for which the SOM training gives the maximum likelih...
Interpretable Neural Networks with BP-SOM
Interpretation of models induced by artificial neural networks is often a difficult task. In this paper we focus on a relatively novel neural network architecture and learning algorithm, bp-som, that offers possibilities to overcome this difficulty. It is shown that networks trained with bp-som show interesting regularities, in that hidden-unit activations become restricted to discrete values, and th...